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Abstract 

Various metrics for comparing diffusion tensors have been recently proposed in the liter- 
ature. We consider a broad family of metrics which is indexed by a single power parameter. 
A likelihood-based procedure is developed for choosing the most appropriate metric from 
the family for a given dataset at hand. The approach is analogous to using the Box-Cox 
transformation that is frequently investigated in regression analysis. The methodology is 
illustrated with a simulation study and an application to a real dataset of diffusion tensor 
images of canine hearts. 
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1 Introduction 

In many applications it is of interest to consider statistical analysis of covariance matrices. In 
particular, typical tasks might be to estimate a mean covariance matrix, describe the structure 
of the variability in the matrices, and interpolate between covariance matrices. A key aspect of 
such analysis is the choice of metric for comparing covariance matrices. 

In medical imaging a particular type of data structure commonly arises when analysing diffu- 
sion weighted images, called the diffusion tensor (Basser et al., 1994; Alexander, 2005). The 
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diffusion tensor (DT) is simply a positive semi-definite symmetric matrix which is related to 
the covariance matrix of the diffusion of water molecules at a particular voxel in the brain. In 
diffusion tensor imaging a number of different metrics have been proposed for comparing two 
tensors Si, S 2 , which are symmetric positive semi-definite mxm covariance matrices. Metrics 
include the Euclidean metric, Riemannian metric (Pennec et al., 2006), log-Euclidean metric 
(Arsigny et al., 2007), Cholesky metric (Wang et al., 2004), and the Procrustes size-and- shape 
metric (Dry den et al., 2009), among many others. In several of these papers it has been demon- 
strated how the choice of metric makes a substantial difference to estimation and interpolation. 
However, for a particular application at hand, which is the preferred choice of metric? 

Statistical approaches to the analysis of DT data include Schwartzman et al. (2010), Whitcher 
et al. (2007), Zhu et al. (2007, 2009), and, for example, hypothesis tests for group means and 
extensions of regression models are considered. However, the choice of statistical model or 
metric on the space of diffusion tensors is problem specific. 

The aim of this paper is to consider a family of metrics indexed by a single power parameter. We 
will develop likelihood-based methods for estimation of the power parameter, which will enable 
us to select the most appropriate metric from the family. The methodology is applied to simu- 
lated data and a real dataset of diffusion tensors from a sample of canine hearts. The procedure 
that is developed is directly analogous to the use of variance stabilizing Box-Cox transforma- 
tions that are very commonly used in regression analysis in statistics (Box and Cox, 1964). The 
general idea is that a suitable metric for a problem will be based on a power transformation that 
makes the data "as Gaussian as possible". 



2 Power Euclidean metrics 

2.1 Metrics and the sample Frechet mean 

Let S = UAU T be the usual spectral decomposition with U G 0(m) an orthogonal matrix and 
A diagonal with strictly positive entries. Let log A be a diagonal matrix with logarithm of the 
diagonal elements of A on the diagonal. The logarithm of S is given by log S — U (log A)U T 
and likewise the exponential of the matrix S is exp S = U (exp A)U T . The log-Euclidean metric 
(Arsigny et al., 2007) is given by 

d L (S 1 ,S 2 ) = \\\og(S 1 )-\og(S 2 )\\. (1) 
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A broad family of covariance matrix metrics is the set of power Euclidean metrics 

d A {S u S 2 ) = -\\S^-S^l (2) 
a 

where S a = UA a U T (see Dryden et al., 2009). As a — > the metric approaches the log- 
Euclidean metric of CD)- We could consider any a 6 R depending on the situation, and we 
use the log-Euclidean metric if a = 0. Note that all these metrics are Euclidean metrics, with 
curvature of the space being zero. Tangent space co-ordinates are also therefore very simple. 
If a = the covariance matrices must be strictly positive definite, but if a ^ then we can 
compare positive semi-definite matrices. 

Consider a random sample available {Si, . . . , S n } from a population. The sample Frechet mean 
based on metric d() is calculated by finding 

S = arginff:rf(^,S) 2 . 

^ i=i 

In our case with all the power metrics, the manifold is isomorphic to a convex subset of R p , and 
the existence and uniqueness of the mean follows immediately (Karcher, 1977; Le, 1995). 

The sample Frechet mean of the powered covariance matrices provides an estimate of the pop- 
ulation covariance matrix, and is given by 

{n 1 1 n 

£||Sf-A|| 2 =-£<??• 
i=i J n i=i 

For a > the estimators become more resistant to outliers for smaller a, and for larger a the 
the estimators become less resistant to outliers. For a < one is working with powers of the 
inverse covariance matrix. The resulting fractional anisotropy measure using the power metric 
© is given by 

x 1/2 

" i L i=l 1=1 ) 

and \ a = — YhLi A practical visualisation tool is to vary a in order for a neurologist to 
help interpret the white fibre tracts in the images (Dryden et al., 2009). 

A question of interest is what a should we take in practice? We develop a criterion which in- 
volves choosing a to make the data as multivariate Gaussian as possible, as statistical inference 
should be more straightforward in such cases. Such a criterion is similar to the Box-Cox trans- 
formation used in regression analysis, except in our case we are working with matrix powers of 
symmetric positive definite matrices. 
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2.2 Likelihood 



First of all we write down the probability density function for Lj = S", i = l,...,n, which is 
assumed to be multivariate Gaussian in the m(m + l)/2 distinct elements of Lj. The density of 

Li is 

f L {Li\ = ^ m ( m+1)/4 |S|~ 1/2 exp{-^vech(L — /i) T S' 1 vech(L— //)}, i = l,...,n, 

where the parameters are /x (a m(m + l)/2 vector) and £ (a m(m + l)/2 x m(m + l)/2 
symmetric positive definite matrix). Note that vech(A) is a vector of the m(m + l)/2 elements 
in the upper triangle of A. 

Our data though are given as so we transform from L to S using 5 = (aL) l ^ a , to give the 
density of S. The transformation can be carried out in three stages. We first transform from L 
to (U, \D a ), where L = MjD a U T is the usual spectral decomposition with U G 0(m) and 
D = diag(di, . . . , d m ) is diagonal with positive, distinct diagonal values. We then transform 
from ^D a to D and then finally we transform to S = UDU T . Using Fang and Zhang (1990, 
p24) the Jacobian of the first transformation is given by 

m—1 m -i 

Ji=fniu) n n -(<*?-<*?)> 
i=i j=i+i a 

where f n (U) = 1/Il"=i + an d E/(i) is the first i-dimensional principal minor of U. The 
second transformation has Jacobian 



j 2 = LK 



,jct— 1 

1=1 

and finally the third transformation has Jacobian 



-| m— 1 m 

■j^tjt^u n 



/n(^) i=l jiii — " 
Putting these togther the density of S given a is: 

i m m—1 m ^ja 

fs(S;a,^) = f L (-S a ;^)Udr 1 U U ' 

u i=i i=i j=i+l u i/ 

Hence, given a random sample Si,...,S n of diffusion tensors the log-likelihood of the param- 
eters (a, /J, E) is given by 

L(a,/i,S) = ^log/ 5 (^;«,/i,S). (3) 
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2.3 Statistical Inference 



Maximum likelihood estimation of the parameters in the model is straightforward if a is known, 
in which case // and £ are given by the usual Gaussian m.l.e.s of the sample mean and sample 
covariance matrices of vech(Lj), i — 1, . . . , n, where n > m(m + l)/2. 

Hence, a method for estimation is to evaluate the profile log-likelihood for a range of a (sub- 
stituting in the maximum likelihood estimators (m.l.e.'s) of //, £ given a). The Jacobian term 
is straightforward to evaluate if the eigenvalues of Si are sufficiently distinct. However, if some 
of them are close to each other then a Taylor series expansion is used for numerical evaluation. 
Also, a Taylor series expansion is used for a close to zero. In particular, writing fi = \ b /\ a — l 
then we have the Taylor Series expansion 



which can be used for computation when the eigenvalues become close. Also, the Taylor series 
expansion in terms of a is 



Ar 1 (log(l+/i)//i+alog(l+ / i)V(2/i) + a 2 log(l+ / i)V(6 / i)+a 3 log(l+ / i) 4 /(24/i)+0(a 4 )), 



which is useful for small \a\. 

In order to obtain confidence intervals for a we use Wilks' Theorem which states that, for large 
samples, 



where A is the likelihood ratio for testing H : a = a versus H x : a ^ a . In practice, an 
approximate 95% confidence interval is obtained by the values of a which have log-likelihood 
within 2 from the maximum. 




X k a -\1 + 1(« - I)// + \(a - l)(a - 2)^ 2 + l(a - l)(a - 2)(a - 3)^ 3 + 0(/i 4 )) 



-21ogA^x?, 



5 



3 Applications 



3.1 Simulation study 

We initially consider a simulation study with m = 3 and vech(X) are simulated from a multi- 
variate Gaussian 

vech(X) ~ N 6 (/j,a 2 I) , 

and then the symmetric matrix X is transformed to the matrix S = (a!) 1 /*. A random sample 
of S matrices are simulated for n v voxels, with n s samples at each voxel. Each of the n = 
n s n v simulated tensors are independent and identically distributed. We consider an example 
simulation where the true mean matrix is /i = diag(2, 1, 1), the true a = 0.3 and a 2 = 0.02 is 
the noise variance. 

For each simulated sample of size n the m.l.e. and confidence intervals for a are obtained, by 
evaluating the log-likelihood over a range of values for a. In our example the range of values for 
evaluation is [—0.1, 0.7] in steps of 0.02. The m.l.e. of a maximizes the log-likelihood ©, and 
the confidence interval is taken as the range of a within 2 of the maximum of the log-likelihood. 

From 1000 Monte Carlo simulations we count the number of times that the true a lies inside the 
estimated 95% confidence interval. The results are given in Tabled] We see that the coverage 
is low for small samples, with the confidence intervals being conservative. However, coverage 
is very good for n s n v > 20 here. 

INSERT TABLE 1 ABOUT HERE 

3.2 Canine Hearts 

The estimation of a and confidence intervals are also carried out for data on a subset of regis- 
tered DT images of canine hearts. The dataset consists of nine registered hearts, and is described 
by Peyrat et al. (2007). There are n s = 9 DT images which have been registered and then a 
mask is extracted. We take an equally spaced grid at intervals of 2mm in the mask and consider 
a neighbourhood of voxels within a ball of radius 0.7mm. We consider interior points where 
the sub-region contains at n v — 1 neighbourhood voxels in addition to the voxel itself (where 
n v > 15). Hence, we have n = n s n v voxels contributing to the likelihood of ©, with a com- 
mon fj,, £ and a. The parameter a is estimated by maximum likelihood using the voxels in each 
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local sub-region. Hence, we are able to estimate a in different regions of the heart. 

We consider analysis of the DT data using a global normalization (as the data have been mea- 
sured at different temperatures) and also without normalization. The DTs have been normalised 
for each heart by taking an overall mean tensor (using Euclidean mean a = 1) and then dividing 
by the norm of this average matrix. 

The results of the canine hearts analysis for the normalized data are displayed in Figures [Hand 
[21 We see that a is generally lower on the left of this region and higher on the right. Values are 
particularly high at the upper right in the view in Figure [2] In practice one would wish to use 
the procedure to choose a single value of a suitable for the whole image, perhaps rounded to 
the nearest half for ease of explanation. Regarding the smoother estimates of Figure \T\ values of 
a in the range 0.4 — 0.8 would not be unreasonable for this dataset, hence a suitable rounded 
choice would be a = 0.5, corresponding to the matrix square root. 

INSERT FIGURES 1 AND 2 ABOUT HERE 

In Figures [3HU we see the equivalent plots for the non-normalized data. Again values in the 
range 0.4 — 0.8 seem reasonable, although there is no longer a pattern of general increase from 
left to right. Rather the larger values are at the front/middle of the subregion. On average the 
estimates are a little lower than for the normalized data, but an overall rounded value of a = 0.5 
would be reasonable for these data as well. 

INSERT FIGURES 3 AND 4 ABOUT HERE 

Once a suitable value of a has been estimated then statistical analysis of the diffusion tensors 
can proceed. For example, we may wish to consider mean diffusion tensors in a region using the 
Frechet mean, we might want to carry out interpolation in space between tensors or alternatively 
consider fibre tracking using the metric corresponding to the estimated a. 



4 Further work 

Rather than working with the symmetric power matrices, we could match two power matrices 
closer by post multiplying by an orthogonal matrix: 

d P , a {S 1 ,S 2 )= inf hs?-S%R\\. (4) 

R€0(m) a 
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Note that S a is symmetric, but S a R is not symmetric in general. The motivation for carrying 
out this additional matching is that we can decompose S as 

S = ([S a RR T S a }) 1/{2a) = ([(S a R)(S a R) T ]) 1/{2a \ 

where RR T = R T R = I m , and so the covariance matrix S is invariant to the choice of R. 
The Euclidean power metric involves fixing R — I, with S a symmetric. The Procrustes so- 
lution for matching to S 1 ^ is R = UW T , where U, W are obtained from a singular value 
decomposition: 

(S%) T S? = WW T , U,W e 0(m), 

and ^ is a diagonal matrix of singular values. The resulting metric is equivalent to Kendall's 
Riemannian size-and-shape metric in the size-and-shape space of m + 1 points in m dimensions 
(see Dryden et al., 2009, when a = 1/2). This non-Euclidean space has positive curvature, and 
is a cone with a warped product metric (Kendall, 1989; Dryden and Mardia, 1998). However, 
for practical use and in some simulation studies the Proctustes metric and the Euclidean power 
metric often perform similarly (see Dryden et al., 2009 in the a = 1/2 case). Further properties 
and practical usage of the Procrustes power metric will be investigated in further work. 

From the application point of view, more experimental data would be need to clearly establish 
what is the best metric for inter-subject comparison. It would also be very interesting to compare 
with the optimal metric for the intra-subject measurement noise. This last metric could be 
estimated from repeated scan of the same subject. 
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n v 


n s 


Monte Carlo Coverage (%) 


2 


4 


72.5 


2 


5 


88.9 


3 


5 


93.8 


4 


5 


95.0 


5 


5 


94.4 


5 


10 


96.1 


10 


10 


95.7 


20 


20 


95.5 


20 


30 


95.8 



Table 1: Coverage from 1000 Monte Carlo simulations for different values of n v voxels and n s 
samples for the simulated data. The true coverage is 95%. 
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Figure 1: The m.l.e. of a in each of the 124 neighbourhoods. Each image has been normalised 
by dividing by the global norm of the average DT matrix. The location of the neighbourhoods 
proceeds from left of the mask (lower neighbourhood labels) to the right (higher numbered 
labels). A smoother estimate of the m.l.e. is given (middle line) together with smoothed upper 
and lower 95% confidence limits for each neighbourhood. 
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Figure 2: The mask is shown with the principal axis of each DT for the first canine heart. The 
coloured spheres show the estimated a at a regular grid of intervals of 2mm, with sphere radius 
0.7mm. Each image has been normalised by dividing by the global norm of the average DT 
matrix. Dark red indicated low a and light yellow indicates high a. There is a general increase 
in a when proceeding from left to right. 




20 40 60 80 100 120 

subregion 



Figure 3: The m.l.e. of a in each of the 124 neighbourhoods for the unnormalized data. The 
location of the neighbourhoods proceeds from left of the mask (lower neighbourhood labels) 
to the right (higher numbered labels). A smoother estimate of the m.l.e. is given (middle line) 
together with smoothed upper and lower 95% confidence limits for each neighbourhood. 
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Figure 4: The mask is shown with the principal axis of each DT for the first canine heart. The 
coloured spheres show the estimated a at a regular grid of intervals of 2mm, with sphere radius 
0.7mm, from the unnormalized data. Dark red indicated low a and light yellow indicates high 
a. 
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